fix: inject cache_control on content blocks for openai-compatible proxies to Anthropic backends (Bifrost, LiteLLM, Databricks) #25985
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
…rock proxies
When setCacheKey: true is set on an @ai-sdk/openai-compatible provider and the
model ID contains 'bedrock/', or when cacheStrategy: 'bedrock' is explicitly
set, OpenCode now injects cache_control: {type:'ephemeral'} onto message content
blocks instead of sending a promptCacheKey request option.
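
The trigger condition described above can be sketched as a small predicate. This is only an illustration: the function name, parameter names, and option type are hypothetical, and only `setCacheKey`, `cacheStrategy: 'bedrock'`, and the `bedrock/` model-ID check come from this commit message.

```typescript
// Hypothetical option shape: only setCacheKey and cacheStrategy are named in the PR.
interface CompatProviderOptions {
  setCacheKey?: boolean
  cacheStrategy?: "bedrock"
}

// Returns true when cache_control should be injected on content blocks
// instead of sending promptCacheKey (sketch, not the actual OpenCode code).
function shouldInjectCacheControl(
  npm: string,
  modelID: string,
  options: CompatProviderOptions = {},
): boolean {
  // Only the openai-compatible provider package takes this path.
  if (npm !== "@ai-sdk/openai-compatible") return false
  // Explicit opt-in always wins.
  if (options.cacheStrategy === "bedrock") return true
  // Otherwise auto-trigger on setCacheKey plus a bedrock/ model ID.
  return options.setCacheKey === true && modelID.includes("bedrock/")
}
```

With this sketch, a model ID without `bedrock/` and no explicit `cacheStrategy` returns false, which leaves the existing promptCacheKey behavior untouched.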
promptCacheKey is an OpenAI-native mechanism that Bifrost, LiteLLM, and other
proxies routing to AWS Bedrock/Anthropic ignore entirely. These proxies require
cache_control on individual content blocks (Anthropic-style), which they then
translate to the native backend caching format.
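
To make the difference concrete, here is a rough sketch of the two request shapes. The model ID is taken from the verification notes in this PR; the cache key value, the prompt text, and the helper name are made up for illustration, and the exact serialized name of the cache key field may differ from what is shown.

```typescript
// Shape 1: cache key as a top-level request option (ignored by these proxies).
// The field name mirrors the option name in this PR; the wire name may differ.
const withCacheKey = {
  model: "bedrock/global.anthropic.claude-sonnet-4-6",
  promptCacheKey: "session-123", // hypothetical value
  messages: [{ role: "system", content: "You are a helpful assistant." }],
}

// Shape 2: Anthropic-style cache_control on an individual content block.
const withCacheControl = {
  model: "bedrock/global.anthropic.claude-sonnet-4-6",
  messages: [
    {
      role: "system",
      content: [
        {
          type: "text",
          text: "You are a helpful assistant.",
          cache_control: { type: "ephemeral" },
        },
      ],
    },
  ],
}

// Hypothetical helper: true if any content block carries cache_control.
function hasBlockCacheControl(body: { messages: { content: unknown }[] }): boolean {
  return body.messages.some(
    (m) =>
      Array.isArray(m.content) &&
      m.content.some((b) => typeof b === "object" && b !== null && "cache_control" in b),
  )
}
```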
Key changes:
- applyCompatCaching(): new function that converts string system messages to
content block arrays and annotates the last block of system/user messages with
cache_control via providerOptions.openaiCompatible — matching what Bifrost and
LiteLLM expect on the wire
- Guards applyCaching() from running on @ai-sdk/openai-compatible models to
prevent the 'claude' model-id heuristic from triggering the wrong caching path
- Passes provider options (item.options) into ProviderTransform.message() so
setCacheKey / cacheStrategy are available at message-transform time
- Adds cacheStrategy: 'bedrock' option to provider config schema
- Docs: new section explaining caching for openai-compatible Bedrock proxies
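
A minimal sketch of the message transformation from the first bullet, under stated assumptions. The types below are simplified stand-ins rather than the AI SDK's real message types, the function name is hypothetical, and leaving string user messages unchanged is an assumption of this sketch. The actual `applyCompatCaching()` in the PR may differ.

```typescript
// Simplified stand-in types (assumptions, not the AI SDK definitions).
type CacheControl = { type: "ephemeral" }
type ProviderOptions = { openaiCompatible?: { cache_control?: CacheControl } }
type TextPart = { type: "text"; text: string; providerOptions?: ProviderOptions }
type Message = {
  role: "system" | "user" | "assistant"
  content: string | TextPart[]
}

function applyCompatCachingSketch(messages: Message[]): Message[] {
  return messages.map((msg): Message => {
    if (msg.role !== "system" && msg.role !== "user") return msg

    let parts: TextPart[]
    if (typeof msg.content === "string") {
      // Only string system messages are converted to a content block array.
      // Assumption: string user messages are left as-is in this sketch.
      if (msg.role !== "system") return msg
      parts = [{ type: "text", text: msg.content }]
    } else {
      parts = msg.content.map((p) => ({ ...p }))
    }

    const last = parts[parts.length - 1]
    if (!last) return msg

    // Annotate the last block so cache_control travels with the block itself.
    parts[parts.length - 1] = {
      ...last,
      providerOptions: {
        ...last.providerOptions,
        openaiCompatible: {
          ...last.providerOptions?.openaiCompatible,
          cache_control: { type: "ephemeral" },
        },
      },
    }
    return { ...msg, content: parts }
  })
}
```

The sketch copies parts before annotating them, so the input messages are not mutated. Assistant messages pass through unchanged.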
ea6982e to cc3b3b9
|
Since opening this PR, the underlying issue has been confirmed by two more users on different providers:
This makes it clear the issue affects a broad class of OpenAI-compatible proxies that route to Anthropic-capable backends, not just Bifrost/LiteLLM. The fix is minimal and isolated to |
|
Hey @rekram1-node and @thdxr, would love to get a review on this when you have a moment.
This fixes a caching issue for users routing Claude models through OpenAI-compatible proxies (Bifrost, LiteLLM, Databricks, Xiaomi Mimo) to Bedrock/Anthropic backends. The root cause:
@rekram1-node, you just touched this area in #26276, so you likely have the most context right now. The fix lives entirely in
The issue has been independently confirmed by users on Databricks and Xiaomi Mimo direct API (see #25984), so this affects a broad class of OpenAI-compatible proxies, not just Bifrost/LiteLLM. |
|
Automated PR Cleanup
Thank you for contributing to opencode. Due to the high volume of PRs from users and AI agents, we periodically close older PRs using automated criteria so maintainers can focus review time on the most active and community-supported contributions.
This PR was closed because it matched the following cleanup criteria:
PRs created within the last month are not affected by this cleanup.
If you believe this PR was closed incorrectly, or if you are still actively working on it, please leave a comment explaining why it should be reopened. A maintainer can review and reopen it if appropriate.
Thanks again for taking the time to contribute. |
Issue for this PR
Closes #25984
Type of change
What does this PR do?
setCacheKey: true on @ai-sdk/openai-compatible providers was causing promptCacheKey to be sent as a top-level request option. Bifrost and LiteLLM (which proxy to Bedrock/Anthropic) don't use this field. They require cache_control: { type: "ephemeral" } on individual message content blocks, which they then translate to the backend's native caching format.

The fix adds a new applyCompatCaching() function in transform.ts that:
- converts string system messages to content block arrays and sets cache_control on each block (message-level injection doesn't work because the SDK spreads it as a top-level field, not a block property)
- annotates the last block of system/user messages with cache_control
- is called from message() when the provider is @ai-sdk/openai-compatible and either cacheStrategy: "bedrock" is set explicitly, or setCacheKey: true with a model ID containing bedrock/

I also added a guard to stop applyCaching() from running on @ai-sdk/openai-compatible providers, since the model.id.includes("claude") heuristic there would have triggered the wrong path for Bifrost models.

I understand why this works:
getOpenAIMetadata() in the AI SDK reads message.providerOptions?.openaiCompatible and spreads it onto the serialized message/block objects. So putting { cache_control: { type: "ephemeral" } } under providerOptions.openaiCompatible on a content block means it lands on the wire as { type: "text", text: "...", cache_control: { type: "ephemeral" } }, which is exactly what Bifrost/LiteLLM expect.

How did you verify your code works?
- Tests in packages/opencode/test/provider/transform.test.ts covering: string system → content block conversion, user block annotation, auto-trigger via bedrock/ model ID, negative cases (no opts, non-bedrock model), and multi-part user messages. All 155 tests pass.
- Ran bun typecheck from packages/opencode: no errors.
- Ran against a local proxy at localhost:24242 routing to bedrock/global.anthropic.claude-sonnet-4-6. Inspected outgoing requests and confirmed cache_control: { type: "ephemeral" } appears on content blocks.

Screenshots / recordings
No UI changes.
Checklist